Creating a Research Collection of Question Answer Sentence Pairs with Amazon's Mechanical Turk

نویسندگان

  • Michael Kaisser
  • John Lowe
چکیده

Each year NIST releases a set of question, document id, answer-triples for the factoid questions used in the TREC Question Answering track. While this resource is widely used and proved itself useful for many purposes, it also is too coarse a grain-size for a lot of other purposes. In this paper we describe how we have used Amazon’s Mechanical Turk to have multiple subjects read the documents and identify the sentences themselves which contain the answer. For most of the 1911 questions in the test sets from 2002 to 2006 and each of the documents said to contain an answer, the Question-Answer Sentence Pairs (QASP) corpus introduced in this paper contains the identified answer sentences. We believe that this corpus, which we will make available to the public, can further stimulate research in QA, especially linguistically motivated research, where matching the question to the answer sentence by either syntactic or semantic means is a central concern.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Paragraph Acquisition and Selection for List Question Using Amazon's Mechanical Turk

Creating more fine-grained annotated data than previously relevent document sets is important for evaluating individual components in automatic question answering systems. In this paper, we describe using the Amazon’s Mechanical Turk (AMT) to judge whether paragraphs in relevant documents answer corresponding list questions in TREC QA track 2004. Based on AMT results, we build a collection of 1...

متن کامل

Collecting Image Annotations Using Amazon's Mechanical Turk

Crowd-sourcing approaches such as Amazon’s Mechanical Turk (MTurk) make it possible to annotate or collect large amounts of linguistic data at a relatively low cost and high speed. However, MTurk offers only limited control over who is allowed to particpate in a particular task. This is particularly problematic for tasks requiring free-form text entry. Unlike multiple-choice tasks there is no c...

متن کامل

Document Image Collection Using Amazon's Mechanical Turk

We present findings from a collaborative effort aimed at testing the feasibility of using Amazon’s Mechanical Turk as a data collection platform to build a corpus of document images. Experimental design and implementation workflow are described. Preliminary findings and directions for future work are also discussed.

متن کامل

Creating Speech and Language Data With Amazon's Mechanical Turk

In this paper we give an introduction to using Amazon’s Mechanical Turk crowdsourcing platform for the purpose of collecting data for human language technologies. We survey the papers published in the NAACL2010 Workshop. 24 researchers participated in the workshop’s shared task to create data for speech and language applications with $100.

متن کامل

Time-Efficient Creation of an Accurate Sentence Fusion Corpus

Sentence fusion enables summarization and question-answering systems to produce output by combining fully formed phrases from different sentences. Yet there is little data that can be used to develop and evaluate fusion techniques. In this paper, we present a methodology for collecting fusions of similar sentence pairs using Amazon’s Mechanical Turk, selecting the input pairs in a semiautomated...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008